Goto

Collaborating Authors

 knowledge uncertainty


ALoss Derivation In this section we provide a more detailed derivation of the proposed loss function (Equation 17)

Neural Information Processing Systems

In this section we provide a more detailed derivation of the proposed loss function (Equation 17). We make use of the fact that the negative entropy of the Dirichlet distribution is equivalent to the reverse KL-divergence to a flat Dirichlet, up to an additive constant which doesn't depend on the model. Additionally, we can see that by adding +1 to the target concentration parameters ห†, we are now minimizing an upper bound to the KL-divergence between the mean and the ensemble. Then we divide through by ห† 0 and drop the additive constant. This yields a loss which is remarkable similar to an ELBO.



A Loss Derivation

Neural Information Processing Systems

In this section we provide a more detailed derivation of the proposed loss function (Equation 17). We make use of the fact that the negative entropy of the Dirichlet distribution is equivalent to the reverse KL-divergence to a flat Dirichlet, up to an additive constant which doesn't depend on the We resolved this by using a single LayerNorm layer just before the final output layer. We suspect that a more numerically stable implementation of the loss would not require LayerNorm. Additionally, we examined the models' median precisions ( Let's examine how to emulate an ensemble of auto-regressive models using Prior Networks. Measures of Uncertainty Let's examine how given this model we can obtain measures of sequence-level total and knowledge uncertainty.



Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data?

arXiv.org Artificial Intelligence

Network-level traffic condition forecasting has been intensively studied for decades. Although prediction accuracy has been continuously improved with emerging deep learning models and ever-expanding traffic data, traffic forecasting still faces many challenges in practice. These challenges include the robustness of data-driven models, the inherent unpredictability of traffic dynamics, and whether further improvement of traffic forecasting requires more sensor data. In this paper, we focus on this latter question and particularly on data from loop detectors. To answer this, we propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models. Firstly, the model design combines traffic flow theory with graph neural networks, ensuring the robustness of prediction and uncertainty quantification. Secondly, evidential learning is employed to quantify different sources of uncertainty in a single pass. The estimated uncertainty is used to "distil" the essence of the dataset that sufficiently covers the information content. Results from a case study of a highway network around Amsterdam show that, from 2018 to 2021, more than 80\% of the data during daytime can be removed. The remaining 20\% samples have equal prediction power for training models. This result suggests that indeed large traffic datasets can be subdivided into significantly smaller but equally informative datasets. From these findings, we conclude that the proposed methodology proves valuable in evaluating large traffic datasets' true information content. Further extensions, such as extracting smaller, spatially non-redundant datasets, are possible with this method.


CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised Active Learning with Label Validation

arXiv.org Artificial Intelligence

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present \textbf{CAMELL} (Confidence-based Acquisition Model for Efficient self-supervised active Learning with Label validation), a pool-based active learning framework tailored for sequential multi-output problems. CAMELL possesses three core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, (2) it facilitates self-supervision for the remainder of the sequence, and (3) it employs a label validation mechanism to prevent erroneous labels from contaminating the dataset and harming model performance. We evaluate CAMELL on sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMELL outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.


Understanding Noisy Data and Uncertainty in Machine Learning

#artificialintelligence

The fields of Artificial Intelligence and Machine Learning are hotter than ever. With models like Chat GPT and Stable Diffusion taking the world by storm, AI and ML hype has made a resurgence and is catching the attention of the masses. With all of this hype, it's important to remind ourselves who the governor of machine learning success is -- high-quality data. In the absence of quality training data, supervised machine learning models offer no utility. Unfortunately, most real-world data science projects fail because unrealistic expectations about model performance are made before the quality of the data source is fully understood.


Self-Distribution Distillation: Efficient Uncertainty Estimation

arXiv.org Machine Learning

Deep learning is increasingly being applied in safety-critical domains. For these scenarios it is important to know the level of uncertainty in a model's prediction to ensure appropriate decisions are made by the system. Deep ensembles are the de-facto standard approach to obtaining various measures of uncertainty. However, ensembles often significantly increase the resources required in the training and/or deployment phases. Approaches have been developed that typically address the costs in one of these phases. In this work we propose a novel training approach, self-distribution distillation (S2D), which is able to efficiently train a single model that can estimate uncertainties. Furthermore it is possible to build ensembles of these models and apply hierarchical ensemble distillation approaches. Experiments on CIFAR-100 showed that S2D models outperformed standard models and Monte-Carlo dropout. Additional out-of-distribution detection experiments on LSUN, Tiny ImageNet, SVHN showed that even a standard deep ensemble can be outperformed using S2D based ensembles and novel distilled models.


Uncertainty in Gradient Boosting via Ensembles

arXiv.org Machine Learning

Gradient boosting is a powerful machine learning technique that is particularly successful for tasks containing heterogeneous features and noisy data. While gradient boosting classification models return a distribution over class labels, regressions models typically yield only point predictions. However, for many practical, high-risk applications, it is also important to be able to quantify uncertainty in the predictions to avoid costly mistakes. In this work, we examine a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. Crucially, the proposed approach allows the total uncertainty to be decomposed into \textit{data uncertainty}, which comes from the complexity and noise in data distribution, and \textit{knowledge uncertainty}, coming from the lack of information about a given region of the feature space. Two approaches for generating ensembles are considered: Stochastic Gradient Boosting (SGB) and Stochastic Gradient Langevin Boosting (SGLB). Notably, SGLB also enables the generation of a \emph{virtual} ensemble via only one gradient boosting model, which significantly reduces complexity. Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and measures of uncertainty successfully enable detection of out-of-domain inputs.


Regression Prior Networks

arXiv.org Machine Learning

Prior Networks are a class of models which yield interpretable measures of uncertainty and have been shown to outperform state-of-the-art ensemble approaches on a range of tasks. However, Prior Networks have so far been developed only for classification tasks. The properties of Regression Prior Networks are demonstrated on synthetic data, selected UCI datasets, and two monocular depth estimation tasks. They yield performance competitive with ensemble approaches. However, in order to improve the safety of AI systems (Amodei et al., 2016) and avoid costly mistakes in high-risk applications, such as self-driving cars, it is desirable for models to yield estimates of uncertainty in their predictions. Ensemble methods are known to yield both improved predictive performance and robust uncertainty estimates (Gal & Ghahramani, 2016; Lakshminarayanan et al., 2017; Maddox et al., 2019). Importantly, ensemble approaches allow interpretable measures of uncertainty to be derived via a mathematically consistent probabilistic framework. Specifically, the overall total uncertainty can be decomposed into data uncertainty, or uncertainty due to inherent noise in the data, and knowledge uncertainty, which is due to the model having limited uncertainty of the test data (Malinin, 2019). Uncertainty estimates derived from ensembles have been applied to the detection of misclassifications, out-of-domain inputs and adversarial attack detection (Carlini & Wagner, 2017; Smith & Gal, 2018), and active learning (Kirsch et al., 2019). Unfortunately, ensemble methods may be computationally expensive to train and are always expensive during inference.